Dataset summary: Dengue
Report generated using dataprep.
Dataset Statistics
| Number of Variables | 14 |
|---|---|
| Number of Rows | 124897 |
| Missing Cells | 513559 |
| Missing Cells (%) | 29.4% |
| Duplicate Rows | 63027 |
| Duplicate Rows (%) | 50.5% |
| Total Size in Memory | 46.3 MB |
| Average Row Size in Memory | 388.9 B |
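The percentages in this table follow from the raw counts: missing cells are divided by rows times variables, and duplicate rows by the row count. The sketch below reproduces that arithmetic on plain Python tuples, with `None` standing in for a missing cell. The helper names `pct` and `summarize` are hypothetical, and the duplicate rule (every repeat after the first occurrence counts) is an assumption about how the report counts duplicates, not something the report states.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize(rows):
    """Count missing cells and duplicate rows in a list of equal-length tuples.

    Assumption: a missing cell is represented by None, and a row is a
    duplicate if an identical row appeared earlier in the list.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(cell is None for row in rows for cell in row)

    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    return {
        "rows": n_rows,
        "variables": n_cols,
        "missing_cells": missing,
        "missing_cells_pct": pct(missing, n_rows * n_cols) if n_rows else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": pct(duplicates, n_rows) if n_rows else 0.0,
    }
```

Applied to the counts above, `pct(513559, 124897 * 14)` and `pct(63027, 124897)` agree with the reported 29.4% and 50.5%.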
Variable Types
| Categorical | 9 |
|---|---|
| Numerical | 5 |
abdominal_pain
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 13921 |
| Missing (%) | 11.2% |
| Memory Size | 7.4 MB |
Length
| Mean | 4.7556 |
|---|---|
| Standard Deviation | 0.4297 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 527762 |
|---|---|
| Lowercase Letter | 416786 |
| Space Separator | 0 |
| Uppercase Letter | 110976 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
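For a two-valued column, the Length statistics describe the string lengths of the labels, so the mean length encodes the class balance. Assuming the two values are rendered as the strings "True" (4 characters) and "False" (5 characters), which the minimum and maximum lengths suggest but the report does not state, the share of each label can be recovered as in this sketch. The function name `share_of_longer_label` is hypothetical.

```python
import math


def share_of_longer_label(mean_len, short_len, long_len):
    """Fraction of non-missing values that carry the longer label.

    Solves mean_len = short_len * (1 - p) + long_len * p for p.
    """
    return (mean_len - short_len) / (long_len - short_len)


# abdominal_pain: mean length 4.7556, labels of length 4 and 5.
p_false = share_of_longer_label(4.7556, 4, 5)
p_true = 1.0 - p_false

# For labels one character apart, the standard deviation of the
# length equals the Bernoulli standard deviation sqrt(p * (1 - p)).
length_std = math.sqrt(p_false * (1.0 - p_false))
```

Under that assumption roughly 24% of the non-missing abdominal_pain values are True, and `length_std` comes out close to the reported 0.4297. The Uppercase Letter count (110976) also equals the number of non-missing rows (124897 - 13921), consistent with one capital letter per value.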
age
numerical
| Distinct Count | 133 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 213 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 1.9 MB |
| Mean | 10.7163 |
| Minimum | 0 |
| Maximum | 88 |
| Zeros | 33 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
Quantile Statistics
| Minimum | 0 |
|---|---|
| 5-th Percentile | 2 |
| Q1 | 6 |
| Median | 10 |
| Q3 | 13 |
| 95-th Percentile | 26 |
| Maximum | 88 |
| Range | 88 |
| IQR | 7 |
Descriptive Statistics
| Mean | 10.7163 |
|---|---|
| Standard Deviation | 7.2255 |
| Variance | 52.2085 |
| Sum | 1.3361e+06 |
| Skewness | 2.4783 |
| Kurtosis | 11.9615 |
| Coefficient of Variation | 0.6743 |
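The descriptive statistics above can be approximated from raw values with moment formulas. This is a minimal sketch, not dataprep's implementation: it assumes the sample variance (n - 1 denominator), moment-based skewness, and excess kurtosis (normal distribution = 0), and the report may use slightly different conventions. The function name `describe` is hypothetical.

```python
import math


def describe(values):
    """Mean, spread, and shape statistics for a list of at least two numbers."""
    n = len(values)
    mean = sum(values) / n

    # Central moments with an n denominator.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    variance = m2 * n / (n - 1)  # sample variance (assumed convention)
    std = math.sqrt(variance)

    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "sum": sum(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
        "cv": std / mean,  # coefficient of variation
    }
```

The coefficient of variation is simply the standard deviation divided by the mean; for age, 7.2255 / 10.7163 is about 0.674, in line with the table.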
ascites
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 50783 |
| Missing (%) | 40.7% |
| Memory Size | 4.9 MB |
Length
| Mean | 4.9615 |
|---|---|
| Standard Deviation | 0.1924 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 367716 |
|---|---|
| Lowercase Letter | 293602 |
| Space Separator | 0 |
| Uppercase Letter | 74114 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
bleeding_gum
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 50751 |
| Missing (%) | 40.6% |
| Memory Size | 4.9 MB |
Length
| Mean | 4.9652 |
|---|---|
| Standard Deviation | 0.1833 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 368148 |
|---|---|
| Lowercase Letter | 294002 |
| Space Separator | 0 |
| Uppercase Letter | 74146 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
bleeding_mucosal
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 23383 |
| Missing (%) | 18.7% |
| Memory Size | 6.8 MB |
Length
| Mean | 4.8933 |
|---|---|
| Standard Deviation | 0.3088 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 496736 |
|---|---|
| Lowercase Letter | 395222 |
| Space Separator | 0 |
| Uppercase Letter | 101514 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
bleeding_skin
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 57870 |
| Missing (%) | 46.3% |
| Memory Size | 4.5 MB |
Length
| Mean | 4.7434 |
|---|---|
| Standard Deviation | 0.4368 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 317933 |
|---|---|
| Lowercase Letter | 250906 |
| Space Separator | 0 |
| Uppercase Letter | 67027 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
body_temperature
numerical
| Distinct Count | 529 |
|---|---|
| Unique (%) | 2.0% |
| Missing | 97812 |
| Missing (%) | 78.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 423.2 KB |
| Mean | 37.7906 |
| Minimum | 35 |
| Maximum | 41.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
Quantile Statistics
| Minimum | 35 |
|---|---|
| 5-th Percentile | 37 |
| Q1 | 37 |
| Median | 37.5 |
| Q3 | 38.5 |
| 95-th Percentile | 39.5 |
| Maximum | 41.5 |
| Range | 6.5 |
| IQR | 1.5 |
Descriptive Statistics
| Mean | 37.7906 |
|---|---|
| Standard Deviation | 0.9069 |
| Variance | 0.8224 |
| Sum | 1.0236e+06 |
| Skewness | 0.9686 |
| Kurtosis | 0.01539 |
| Coefficient of Variation | 0.024 |
gender
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 173 |
| Missing (%) | 0.1% |
| Memory Size | 8.3 MB |
Length
| Mean | 4.9238 |
|---|---|
| Standard Deviation | 0.9971 |
| Median | 4 |
| Minimum | 4 |
| Maximum | 6 |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Female |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Female |
Letter
| Count | 614116 |
|---|---|
| Lowercase Letter | 489392 |
| Space Separator | 0 |
| Uppercase Letter | 124724 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
haematocrit_percent
numerical
| Distinct Count | 1060 |
|---|---|
| Unique (%) | 2.3% |
| Missing | 77905 |
| Missing (%) | 62.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 734.2 KB |
| Mean | 39.862 |
| Minimum | 1 |
| Maximum | 74.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
Quantile Statistics
| Minimum | 1 |
|---|---|
| 5-th Percentile | 32.8 |
| Q1 | 36.7 |
| Median | 39.4 |
| Q3 | 42.5 |
| 95-th Percentile | 49 |
| Maximum | 74.4 |
| Range | 73.4 |
| IQR | 5.8 |
Descriptive Statistics
| Mean | 39.862 |
|---|---|
| Standard Deviation | 4.93 |
| Variance | 24.3053 |
| Sum | 1.8732e+06 |
| Skewness | 0.5428 |
| Kurtosis | 1.522 |
| Coefficient of Variation | 0.1237 |
plt
numerical
| Distinct Count | 2333 |
|---|---|
| Unique (%) | 5.0% |
| Missing | 78088 |
| Missing (%) | 62.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 731.4 KB |
| Mean | 5484.7121 |
| Minimum | 1 |
| Maximum | 268768.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
Quantile Statistics
| Minimum | 1 |
|---|---|
| 5-th Percentile | 27 |
| Q1 | 83 |
| Median | 159 |
| Q3 | 257 |
| 95-th Percentile | 50050 |
| Maximum | 268768.5 |
| Range | 268767.5 |
| IQR | 174 |
Descriptive Statistics
| Mean | 5484.7121 |
|---|---|
| Standard Deviation | 21859.904 |
| Variance | 4.7786e+08 |
| Sum | 2.5673e+08 |
| Skewness | 4.9072 |
| Kurtosis | 27.284 |
| Coefficient of Variation | 3.9856 |
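The plt quantiles show a long upper tail: the 95th percentile (50050) is far above Q3 (257). One common way to flag such values is Tukey's rule, which marks anything beyond 1.5 × IQR from the quartiles. The sketch below computes those fences from the reported quartiles. It is a generic screening technique rather than something the report applies, and the function name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences from the first and third quartiles."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


def is_outlier(value, q1, q3, k=1.5):
    """True if value falls outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return value < lower or value > upper


# plt quartiles from the table above: Q1 = 83, Q3 = 257.
plt_lower, plt_upper = tukey_fences(83, 257)
```

With these quartiles the upper fence is 518, so both the 95th percentile and the maximum of plt would be flagged for review. Whether those values are recording errors or valid measurements cannot be determined from the report alone.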
weight
numerical
| Distinct Count | 324 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 545 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 1.9 MB |
| Mean | 32.1729 |
| Minimum | 7.2 |
| Maximum | 114 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
Quantile Statistics
| Minimum | 7.2 |
|---|---|
| 5-th Percentile | 13 |
| Q1 | 21 |
| Median | 30 |
| Q3 | 42 |
| 95-th Percentile | 58 |
| Maximum | 114 |
| Range | 106.8 |
| IQR | 21 |
Descriptive Statistics
| Mean | 32.1729 |
|---|---|
| Standard Deviation | 14.3309 |
| Variance | 205.3754 |
| Sum | 4.0008e+06 |
| Skewness | 0.695 |
| Kurtosis | 0.2718 |
| Coefficient of Variation | 0.4454 |
bleeding
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 61556 |
| Missing (%) | 49.3% |
| Memory Size | 4.2 MB |
Length
| Mean | 4.5694 |
|---|---|
| Standard Deviation | 0.4952 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | False |
|---|---|
| 2nd row | False |
| 3rd row | False |
| 4th row | False |
| 5th row | False |
Letter
| Count | 289431 |
|---|---|
| Lowercase Letter | 226090 |
| Space Separator | 0 |
| Uppercase Letter | 63341 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
shock
categorical
| Distinct Count | 2 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 559 |
| Missing (%) | 0.4% |
| Memory Size | 8.3 MB |
Length
| Mean | 4.9413 |
|---|---|
| Standard Deviation | 0.2351 |
| Median | 5 |
| Minimum | 4 |
| Maximum | 5 |
Sample
| 1st row | True |
|---|---|
| 2nd row | True |
| 3rd row | True |
| 4th row | True |
| 5th row | True |
Letter
| Count | 614387 |
|---|---|
| Lowercase Letter | 490049 |
| Space Separator | 0 |
| Uppercase Letter | 124338 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
dsource
categorical
| Distinct Count | 10 |
|---|---|
| Unique (%) | 0.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory Size | 8.1 MB |
Length
| Mean | 3.0035 |
|---|---|
| Standard Deviation | 1.0155 |
| Median | 2 |
| Minimum | 2 |
| Maximum | 5 |
Sample
| 1st row | 01nva |
|---|---|
| 2nd row | 01nva |
| 3rd row | 01nva |
| 4th row | 01nva |
| 5th row | 01nva |
Letter
| Count | 250201 |
|---|---|
| Lowercase Letter | 250201 |
| Space Separator | 0 |
| Uppercase Letter | 0 |
| Dash Punctuation | 0 |
| Decimal Number | 124924 |
from dataprep.eda import create_report
from pkgname.utils.data_loader import load_dengue
from pkgname.utils.print_utils import suppress_stdout, suppress_stderr

# Columns to load from the dengue dataset.
features = ["dsource", "age", "gender", "weight", "bleeding", "plt",
            "shock", "haematocrit_percent", "bleeding_gum", "abdominal_pain",
            "ascites", "bleeding_mucosal", "bleeding_skin", "body_temperature"]

# Use a comma, not `and`, so that both context managers are entered.
with suppress_stdout(), suppress_stderr():
    df = load_dengue(usecols=features)
    report = create_report(df, title="Dengue dataset report")

# The bare name renders the report in the gallery output.
report
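When two context managers are combined in a single `with` statement, the separator matters. `with a() and b():` evaluates the expression `a() and b()` first, which yields only the second object, so the first manager is never entered. The comma form enters both. The sketch below demonstrates this with a hypothetical `tracked` manager that stands in for `suppress_stdout` and `suppress_stderr`, whose implementations are not shown here.

```python
import contextlib


def entered_managers(use_and):
    """Return the names of the context managers that were actually entered."""
    entered = []

    @contextlib.contextmanager
    def tracked(name):
        # Runs only when the manager is entered by a `with` statement.
        entered.append(name)
        yield

    if use_and:
        # `and` returns its second operand, so only "stderr" is entered.
        with tracked("stdout") and tracked("stderr"):
            pass
    else:
        # The comma form enters both managers, left to right.
        with tracked("stdout"), tracked("stderr"):
            pass

    return entered
```

Calling `entered_managers(True)` reports a single entered manager, while `entered_managers(False)` reports both.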
Total running time of the script: (0 minutes 4.891 seconds)